Intrinsic Dimensionality Estimation in Visualizing Toxicity Data

Authors

  • Natalia Kireeva
  • Svetlana I. Ovchinnikova
Abstract

Over the years, a number of dimensionality reduction techniques have been proposed and used in chemoinformatics to perform nonlinear mappings. However, data visualization techniques can be applied efficiently for dimensionality reduction mainly when the data are not truly high-dimensional and can be represented as a nonlinear low-dimensional manifold, so that dimensionality can be reduced without significant information loss. In this study, several intrinsic dimensionality estimation approaches have been investigated: the Geodesic Minimum Spanning Tree, the eigenvalue-based, and the Maximum Likelihood estimators. Their performance has been compared for visualizing toxicity data in different descriptor spaces.

INTRODUCTION

Over the years, a number of dimensionality reduction techniques have been proposed and used in chemoinformatics to perform nonlinear mappings. However, data visualization techniques can be applied efficiently for dimensionality reduction mainly when the data are not truly high-dimensional and can be represented as a nonlinear low-dimensional manifold, so that dimensionality can be reduced without significant information loss [1]. In this study, several intrinsic dimensionality estimation [2] approaches have been investigated: the Geodesic Minimum Spanning Tree [3], the eigenvalue-based [4,5], and the Maximum Likelihood [1] estimators. Their performance has been compared for visualizing toxicity data in different descriptor spaces. The obtained values of the intrinsic dimensionality (ID) of the data were compared with the quantitative results of data visualization for two dimensionality reduction approaches: Diffusion maps and Isomap.

MATERIALS AND METHODS

For intrinsic dimensionality estimation and dimensionality reduction, the implementations provided by the Matlab Toolbox for Dimensionality Reduction (v 0.7.1b) [6] were used.
Intrinsic dimensionality estimators

The intrinsic dimensionality of the data can be defined as the minimal number of variables needed to describe the data x. Intrinsic dimensionality estimators fall into two main categories: eigenvalue (projection) methods and geometric methods.

Eigenvalue methods are based on principal component analysis (PCA) [7]. PCA projects the data along the directions of maximal variance by computing the eigenvalues and eigenvectors of the covariance matrix of the data. The intrinsic dimensionality (ID) is then taken to be the number of eigenvalues that exceed a predefined threshold.

Geometric methods are mostly based on fractal dimensions or nearest-neighbor distances. In this study, the Geodesic Minimum Spanning Tree [3] and the Maximum Likelihood Estimator [1] were used as representatives of the second group of methods.

The Geodesic Minimum Spanning Tree (GMST) estimator proceeds in several steps. First, a complete graph based on the geodesic distances between all pairs of data points is built. A minimal spanning graph, the GMST, is then obtained by reducing the initial graph to a subgraph in which every data point x_i is connected to its k nearest neighbors. The intrinsic dimension is estimated from the GMST length functional L:
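Returning to the eigenvalue-based estimator described earlier in this section, the idea of counting the covariance eigenvalues above a threshold can be sketched in a few lines of Python. This is only an illustrative sketch and not the toolbox implementation used in the study: the function name eigenvalue_id, the normalization of eigenvalues by their sum, the threshold value of 0.025, and the synthetic test data are all assumptions made for the example.

```python
import numpy as np


def eigenvalue_id(X, threshold=0.025):
    """Count covariance eigenvalues whose share of total variance exceeds `threshold`.

    Hypothetical helper: the normalization and the default threshold are
    assumptions for illustration, not values taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Covariance matrix of the features (columns); np.cov centers the data.
    cov = np.cov(X, rowvar=False)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # Guard against tiny negative values caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    ratios = eigvals / eigvals.sum()
    return int(np.sum(ratios > threshold))


# Synthetic data (assumed for the example): a 2-D latent plane embedded
# in a 10-D space, plus a small amount of isotropic noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = np.zeros((2, 10))
mixing[0, :5] = 1.0   # first latent variable drives features 0..4
mixing[1, 5:] = 1.0   # second latent variable drives features 5..9
X = latent @ mixing + 1e-3 * rng.normal(size=(500, 10))

estimated_id = eigenvalue_id(X)
print(estimated_id)
```

Because this estimator is linear, it tends to overestimate the dimensionality of curved manifolds, which is one reason geometric estimators such as the GMST are considered alongside it.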


Similar resources

Matlab Toolbox for Dimensionality Reduction

The demonstration presents the Matlab Toolbox for Dimensionality Reduction. The toolbox is publicly available and contains implementations of virtually all state-of-the-art techniques for dimensionality reduction and intrinsic dimensionality estimation. It provides implementations of 27 techniques for dimensionality reduction, 6 techniques for intrinsic dimensionality estimation, and additional...


Intrinsic Dimensionality Estimation With Optimally Topology Preserving Maps

A new method for analyzing the intrinsic dimensionality (ID) of low dimensional manifolds in high dimensional feature spaces is presented. The basic idea is to first extract a low-dimensional representation that captures the intrinsic topological structure of the input data and then to analyze this representation, i.e. estimate the intrinsic dimensionality. More specifically, the representation we...


ider: Intrinsic Dimension Estimation with R

In many data analyses, the dimensionality of the observed data is high while its intrinsic dimension remains quite low. Estimating the intrinsic dimension of an observed dataset is an essential preliminary step for dimensionality reduction, manifold learning, and visualization. This paper introduces an R package, named ider, that implements eight intrinsic dimension estimation methods,...


Fractal-Based Methods as a Technique for Estimating the Intrinsic Dimensionality of High-Dimensional Data: A Survey

The estimation of the intrinsic dimensionality of high-dimensional data remains a challenging issue. Various approaches to interpreting and estimating the intrinsic dimensionality have been developed. Referring to the following two classifications of estimators of the intrinsic dimensionality – local/global estimators and projection techniques/geometric approaches – we focus on the fractal-based methods ...


Performing a Preprocessing Step Before the Feature Extraction Step in the Classification of Hyperspectral Image Data

Hyperspectral data potentially contain more information than multispectral data because of their higher spectral resolution. However, the stochastic data analysis approaches that have been successfully applied to multispectral data are not as effective for hyperspectral data. Various investigations indicate that the key problem that causes poor performance in the stochastic approaches t...



Publication date: 2014